NSF PAR Search | NSF Public Access Repository

Database Benchmarking for Supporting Real-Time Interactive Querying of Large Data

https://doi.org/10.1145/3318464.3389732

Battle, Leilani; Eichmann, Philipp; Angelini, Marco; Catarci, Tiziana; Santucci, Giuseppe; Zheng, Yukun; Binnig, Carsten; Fekete, Jean-Daniel; Moritz, Dominik (June 2021, Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data)
In this paper, we present a new benchmark to validate the suitability of database systems for interactive visualization workloads. While there exist proposals for evaluating database systems on interactive data exploration workloads, none rely on real user traces for database benchmarking. To this end, our long-term goal is to collect user traces that represent workloads with different exploration characteristics. In this paper, we present an initial benchmark that focuses on "crossfilter"-style applications, which are a popular interaction type for data exploration and a particularly demanding scenario for testing database system performance. We make our benchmark materials, including input datasets, interaction sequences, corresponding SQL queries, and analysis code, freely available as a community resource, to foster further research in this area: https://osf.io/9xerb/?view_only=81de1a3f99d04529b6b173a3bd5b4d23.
Full Text Available
Chiller: Contention-centric Transaction Execution and Data Partitioning for Modern Networks

https://doi.org/10.1145/3318464.3389724

Zamanian, Erfan; Shun, Julian; Binnig, Carsten; Kraska, Tim (May 2020, Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD))
Full Text Available
Democratizing Data Science through Interactive Curation of ML Pipelines

https://doi.org/10.1145/3299869.3319863

Shang, Zeyuan; Zgraggen, Emanuel; Buratti, Benedetto; Kossmann, Ferdinand; Eichmann, Philipp; Chung, Yeounoh; Binnig, Carsten; Upfal, Eli; Kraska, Tim (January 2019, SIGMOD '19: Proceedings of the 2019 International Conference on Management of Data)

Statistical knowledge and domain expertise are key to extracting actionable insights from data, yet such skills rarely coexist. In Machine Learning, high-quality results are only attainable via mindful data preprocessing, hyperparameter tuning and model selection. Domain experts are often overwhelmed by such complexity, de facto inhibiting a wider adoption of ML techniques in other fields. Existing libraries that claim to solve this problem still require well-trained practitioners. Those frameworks involve heavy data preparation steps and are often too slow for interactive feedback from the user, severely limiting the scope of such systems. In this paper we present Alpine Meadow, a first Interactive Automated Machine Learning tool. What makes our system unique is not only the focus on interactivity, but also the combined systemic and algorithmic design approach; on one hand we leverage ideas from query optimization, on the other we devise novel selection and pruning strategies combining cost-based Multi-Armed Bandits and Bayesian Optimization. We evaluate our system on over 300 datasets and compare against other AutoML tools, including the current NIPS winner, as well as expert solutions. Not only does Alpine Meadow significantly outperform the other AutoML systems while, in contrast to them, providing interactive latencies, but it also outperforms expert solutions in 80% of the cases on data sets we have never seen before.
Full Text Available
Towards Interactive Curation & Automatic Tuning of ML Pipelines

https://doi.org/10.1145/3209889.3209891

Binnig, Carsten; Buratti, Benedetto; Chung, Yeounoh; Cousins, Cyrus; Kraska, Tim; Shang, Zeyuan; Upfal, Eli; Zeleznik, Robert; Zgraggen, Emanuel (January 2018, Proceedings of the Second Workshop on Data Management for End-to-End Machine Learning)

Democratizing Data Science requires a fundamental rethinking of the way data analytics and model discovery are done. Available tools for analyzing massive data sets and curating machine learning models are limited in a number of fundamental ways. First, existing tools require well-trained data scientists to select the appropriate techniques to build models and to evaluate their outcomes. Second, existing tools require heavy data preparation steps and are often too slow to give interactive feedback to domain experts in the model building process, severely limiting the possible interactions. Third, current tools do not provide adequate analysis of statistical risk factors in the model development. In this work, we present the first iteration of QuIC-M (pronounced quick-m), an interactive human-in-the-loop data exploration and model building suite. The goal is to enable domain experts to build machine learning pipelines an order of magnitude faster than machine learning experts while having model qualities comparable to expert solutions.
Full Text Available
